Google Study Shows Transformer Modifications Fail To Transfer Across Implementations and Applications

Since their introduction three years ago, transformer architectures have become the de facto standard for natural language processing (NLP) tasks and are now also being applied in areas such as computer vision. Although many modifications to the transformer architecture have been proposed, they have not transferred across implementations and applications as easily as hoped, which has limited their wider adoption. To understand why most widely used transformer applications shun these modifications, a team from Google Research evaluated them comprehensively in a shared experimental setting. They were surprised to discover that most of the architecture modifications they examined do not meaningfully improve performance on downstream NLP tasks.

The researchers began by reimplementing and evaluating a variety of transformer variants on the tasks where they are most commonly applied. As a baseline, they used the original transformer model with two modifications: applying layer normalization before the self-attention and feedforward blocks instead of after, and using relative attention with shared biases instead of sinusoidal positional embeddings. They employed two experimental settings to evaluate each modification's performance: transfer learning based on T5, and supervised machine translation on the WMT'14 English-German translation task.
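To make the first baseline change concrete, the sketch below contrasts the two orderings of layer normalization around a residual sublayer. It is a minimal illustration in plain Python, not the researchers' code. The names `layer_norm`, `post_ln_block`, `pre_ln_block`, and `toy_sublayer` are hypothetical, the normalization omits the learned scale and bias, and the toy sublayer merely stands in for a self-attention or feedforward block.

```python
import math


def layer_norm(x, eps=1e-6):
    """Normalize a feature vector to zero mean and (roughly) unit variance.

    Simplification: no learned scale or bias parameters.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def post_ln_block(x, sublayer):
    """Original ordering: apply the sublayer, add the residual, then normalize."""
    y = sublayer(x)
    return layer_norm([a + b for a, b in zip(x, y)])


def pre_ln_block(x, sublayer):
    """Baseline ordering from the article: normalize first, then sublayer, then add the residual."""
    y = sublayer(layer_norm(x))
    return [a + b for a, b in zip(x, y)]


def toy_sublayer(x):
    """Hypothetical stand-in for an attention or feedforward block: halves each feature."""
    return [0.5 * v for v in x]


x = [1.0, 2.0, 4.0, 7.0]
post_out = post_ln_block(x, toy_sublayer)  # output is itself normalized
pre_out = pre_ln_block(x, toy_sublayer)    # input passes through the residual path unnormalized
```

In the pre-normalization variant the residual path carries the input through unchanged, so the block's output is not itself normalized. In the post-normalization variant every block output is renormalized.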